An Ising Model Approach to Malware Epidemiology 
Abstract - We introduce an Ising approach to study the
spread of malware. The Ising spins up and down are used
to represent two states, online and offline, of the nodes in the
network. Malware is allowed to propagate amongst online
nodes, and the rate of propagation was found to increase
with data traffic. For a more efficient network, the spread
of infection is much slower, while for a congested network,
infection spreads quickly.

Keywords - computer networks; computer viruses; epidemio-

I. Introduction

The internet has become a nearly indispensable tool, with
both private individuals and organizations becoming
increasingly dependent on internet-based software services,
downloadable resources like books and movies, online
shopping and banking, and even social networking sites. The
issue of network security has become significant due to the
prevalence of software with malicious or fraudulent intent.
Malware is the general term given to a broad range of
software, including viruses and worms, designed to infiltrate
a computer system without the owner's permission [1][2].
Cohen's conclusion in his 1987 paper that computer viruses
are potentially a severe threat to computer systems [3] is
still valid in real networks today [2][4][5]. Current
security systems do little to control the spread of malicious
content throughout an entire network [4][6]. Most security
systems are designed to protect a single computer unit, and
properly protected units make up only a fraction of online
computers. Together, these facts highlight the need to examine the
dynamics of the spread of malware in order to
develop proper control strategies.

Studies on the spread of malware in computer networks
date back to the late 1980s [7] and are generally based
on the mathematical approach to the spread of diseases in
biological populations. Mathematical models developed for the spread
of malware within a computer network, such as the Kephart-White
model and other models adapted from it, are based
on the Kermack-McKendrick model. These models have
an implicit assumption that all nodes in the network are
always available for "contact" [4][9]. However, it is a basic
limitation of malware that it can only be passed on to another
computer if there is a path through which information can be
passed, so the states of the nodes of the network, whether
they are online or offline, have an effect on the dynamics of
the spread.

In this work, we model the spread of malware utilizing
an Ising system to represent an isolated computer network.
The state of each node is a composite of its connection status
and health. The spin state of a node defines its connection
status to be either online or offline. Connections are
established with the premise that autonomous networks configure
themselves [11]. The health status describes whether a node
has been infected or not, and infection can propagate only
among online nodes.

The Ising model was originally intended for simulating the
magnetic domains of ferromagnetic materials. Its versatility
has allowed it to be applied to other systems wherein
the behavior of individuals is affected by their neighbors
[8][10][11]. It has been applied to networks and network-like
systems [10] such as neural networks [8], cooperation
in social networks, and the analysis of trust in a peer-to-peer
computer network [11].

II. The model 

A computer network is modeled by an $N \times N$ Ising
spin system. Associated with each node is a spin $s_{i,j}$
corresponding to two possible states: +1 for online and -1
for offline. The local interaction energy is given by

$$E_{i,j} = -J_{i,j}\, s_{i,j} \sum_{\text{nearest neighbors}} s_{k,l} \tag{1}$$

where $s_{k,l}$ runs over the nearest neighbors of node $(i,j)$.



The interaction parameter, $J_{i,j}$, determines the degree and
type of dependence of $s_{i,j}$ on its neighbors. The nearest
neighbors, or local neighborhood, are defined according to the
network topology and are usually Von Neumann or Moore
neighborhoods [12][13]. Summing up all local energies
gives the total energy, $E$, of the system. The global energy
$E$ is associated with network efficiency: more efficient
networks are characterized by lower energies.
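As a minimal sketch of Eq. (1), the snippet below computes the local energy of one node and the mean energy per node on a small lattice. It assumes the Von Neumann (four-neighbor) neighborhood, a uniform coupling J = 1, and wrap-around boundaries, as described under Program Specifics; the function names are hypothetical and not taken from the original program.

```python
def local_energy(spins, i, j, coupling=1.0):
    """Local interaction energy of node (i, j), following Eq. (1).

    Assumes a square lattice, four nearest neighbors, and wrap-around edges.
    """
    n = len(spins)
    neighbor_sum = (
        spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
        + spins[i][(j + 1) % n] + spins[i][(j - 1) % n]
    )
    return -coupling * spins[i][j] * neighbor_sum


def mean_energy_per_node(spins, coupling=1.0):
    """Average of the local energies over all nodes of the lattice."""
    n = len(spins)
    total = sum(
        local_energy(spins, i, j, coupling) for i in range(n) for j in range(n)
    )
    return total / (n * n)


# All nodes offline (spin -1): every node agrees with its four neighbors.
all_offline = [[-1] * 10 for _ in range(10)]
print(mean_energy_per_node(all_offline))  # -4.0
```

A fully aligned lattice gives the lowest possible mean energy of -4 under these assumptions, which is the "most efficient" configuration in the sense used here.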

Note that while interaction energies depend explicitly on
the nearest neighbors, the state of each node depends
implicitly on the state of the entire system. A node
will change its configuration provided that the new energy of
the system is lower than the previous one. If the resulting energy
is higher, the new configuration is accepted with probability

$$e^{-\Delta E / k_B T} \tag{2}$$

In the standard Ising procedure, $\Delta E$ is the change in energy,
$T$ is temperature, and $k_B$ is the Boltzmann constant. Here,
$T$ relates to network traffic.
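The acceptance rule can be sketched as follows. This is an illustrative reading, not the original program: it assumes J = 1, four neighbors with wrap-around edges, and a total energy equal to the sum of all local energies. Under that last assumption each neighbor pair is counted twice, so flipping one spin changes the total energy by four times the spin times its neighbor sum. All function names are hypothetical.

```python
import math
import random


def acceptance_probability(delta_e, temperature, k_b=1.0):
    """Probability of accepting a change in state with energy change delta_e."""
    if delta_e <= 0:
        return 1.0  # a move that does not raise the energy is always accepted
    return math.exp(-delta_e / (k_b * temperature))


def flip_delta_e(spins, i, j):
    """Change in total energy if node (i, j) flips (assumes J = 1).

    The total energy is taken as the sum of all local energies, so each
    neighbor pair is counted twice; this is an assumption about the model.
    """
    n = len(spins)
    neighbor_sum = (
        spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
        + spins[i][(j + 1) % n] + spins[i][(j - 1) % n]
    )
    return 4 * spins[i][j] * neighbor_sum


def metropolis_sweep(spins, temperature, rng):
    """Visit every node once and flip it according to the acceptance rule."""
    n = len(spins)
    for i in range(n):
        for j in range(n):
            p = acceptance_probability(flip_delta_e(spins, i, j), temperature)
            if rng.random() < p:
                spins[i][j] = -spins[i][j]


lattice = [[-1] * 10 for _ in range(10)]
metropolis_sweep(lattice, temperature=5.5, rng=random.Random(0))
```

With every node offline, a single flip costs an energy of 16 under these assumptions, so at low T a node is very unlikely to come online.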

To model the spread of infection, each node is assigned
a health status separate from its spin. The health status is
either infected or susceptible. Every online susceptible node has
a probability $P_{inf}$ of becoming infected, where

$$P_{inf} = \frac{\text{number of online infective nodes}}{\text{number of online nodes}} \tag{3}$$

Offline nodes do not transmit or receive data. Hence, they
do not participate in the infection step.
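One way to express the infection step of Eq. (3) is sketched below. It assumes that the probability is computed once per time step from the current configuration and then applied to every online susceptible node; the text does not specify the update order, so this synchronous update and the function names are assumptions.

```python
import random


def infection_probability(online, infected):
    """Eq. (3): fraction of online nodes that are currently infected."""
    online_count = sum(online)
    if online_count == 0:
        return 0.0  # no node is online, so nothing can be transmitted
    online_infected = sum(1 for on, inf in zip(online, infected) if on and inf)
    return online_infected / online_count


def infection_step(online, infected, rng):
    """Return the new health statuses after one infection step.

    Offline nodes keep their status; infected nodes stay infected because
    the model provides no means of removal.
    """
    p_inf = infection_probability(online, infected)
    return [
        inf or (on and rng.random() < p_inf)
        for on, inf in zip(online, infected)
    ]


online = [True, True, True, False]
infected = [True, False, False, True]
print(infection_probability(online, infected))  # 1 of 3 online nodes is infected
```

The lists here are flattened node statuses; a lattice can be handled the same way by iterating over its rows.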

Program Specifics: The computer network is a 10 x 10
lattice. Nearest neighbors are defined to be the four adjacent
nodes. The interaction parameters are all set to $J_{i,j} = J = +1$.
Eq. (1) becomes

$$E_{i,j} = -s_{i,j}\,(s_{i+1,j} + s_{i-1,j} + s_{i,j+1} + s_{i,j-1}) \tag{4}$$



For the interaction energy calculations, circular (periodic) boundary
conditions are imposed. Parameters are scaled such that
$k_B = 1$. Initially, all nodes are offline ($s_{i,j} = -1$).
Every time step, the entire system is swept in a left-to-right,
top-to-bottom fashion, evaluating each node for a possible
change in state. The mean energy per node, $\langle E_{i,j} \rangle$, of each
configuration is stored and averaged at the end of the run.
The spread of infection begins with a single infective.
At $t = 0$, one node is selected at random and infected.
As the infection spreads, the number of susceptibles, $S(t)$,
and infectives, $I(t)$, for each time step are stored. Because
no means for removal of infection is provided, all nodes
eventually become infected. It is at this time that the program
is terminated.
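The procedure above can be put together in a short sketch. Several details are assumptions rather than facts from the original program: the spin sweep is done before the infection step within each time step, the energy change of a flip is taken from the sum of all local energies (each pair counted twice), and a `max_steps` cap is added so that low-T runs stop. The name `run_simulation` is hypothetical.

```python
import math
import random


def neighbor_sum(spins, i, j):
    """Sum of the four adjacent spins with circular boundary conditions."""
    n = len(spins)
    return (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
            + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])


def run_simulation(temperature, size=10, max_steps=50, seed=0):
    """Return ([(S(t), I(t)), ...], mean energy per node averaged over steps)."""
    rng = random.Random(seed)
    total = size * size
    spins = [[-1] * size for _ in range(size)]        # all nodes start offline
    infected = [[False] * size for _ in range(size)]
    infected[rng.randrange(size)][rng.randrange(size)] = True  # one infective
    history, energies = [], []
    for _ in range(max_steps):
        # Spin update: left-to-right, top-to-bottom sweep (k_B = 1, J = 1).
        for i in range(size):
            for j in range(size):
                delta_e = 4 * spins[i][j] * neighbor_sum(spins, i, j)
                if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                    spins[i][j] = -spins[i][j]
        # Mean local energy of the new configuration, Eq. (4).
        energies.append(-sum(spins[i][j] * neighbor_sum(spins, i, j)
                             for i in range(size) for j in range(size)) / total)
        # Infection step among online nodes only, Eq. (3).
        online = [(i, j) for i in range(size) for j in range(size)
                  if spins[i][j] == 1]
        if online:
            p_inf = sum(infected[i][j] for i, j in online) / len(online)
            newly = [(i, j) for i, j in online
                     if not infected[i][j] and rng.random() < p_inf]
            for i, j in newly:
                infected[i][j] = True
        n_infected = sum(row.count(True) for row in infected)
        history.append((total - n_infected, n_infected))
        if n_infected == total:   # every node infected: stop the run
            break
    return history, sum(energies) / len(energies)


history, mean_energy = run_simulation(temperature=5.5, seed=1)
```

Each entry of `history` holds the pair (S(t), I(t)) for one time step, so dividing the second element by the number of nodes gives the fraction of infectives.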

III. Analysis of results 

The model was tested for $T$ values ranging from $T = 1.25$
to $T = 11.25$. The infection curves of five trials were
averaged for each $T$. The average infection curve was
normalized by dividing it by the total number of nodes to
get the fraction of infectives, $i(t)$. Because it can no longer
be assumed that nodes are always available for connection,
a regular decay equation is used to model the fraction of
infectives curve.

A system with $N \times N$ nodes has $S(t)$ susceptibles and $I(t)$
infectives at time $t$. Within the time-frame $dt$, the number of
susceptibles being converted to infectives is $dS(t)$. As time
passes, $dS(t)$ decreases as the population of susceptibles
is exhausted. Thus, the probability of conversion, given by
$dS(t)/S(t)$, decreases with time. In equation form, this is

$$\frac{dS(t)}{S(t)} = -\beta\, dt \tag{5}$$

where $\beta$ is the decay constant. The solution to Eq. (5) is

$$S(t) = S_0\, e^{-\beta t}$$

where $S_0$, the initial number of susceptibles, is just the total
number of units in the system. Using these, the expression
for the number of infectives, $I(t)$, may be written as

$$I(t) = S_0 - S(t) = S_0 \left(1 - e^{-\beta t}\right)$$

This may be normalized to

$$i(t) = 1 - e^{-\beta t} \tag{6}$$

Note that the actual rate of spread varies with time, and $\beta$
provides a measure of the average rate of spread.

Figure 1. Comparison of Infection Curves for Selected T during the first
50 iterations (horizontal axis: time iterations, k): The rate of spread of the
infection increases with T. For the above graphs, the resulting decay
constants are: $\beta(T = 1.25) = 0.000150$, $\beta(T = 2.50) = 0.003118$,
$\beta(T = 3.50) = 0.016776$, $\beta(T = 5.50) = 0.114230$,
$\beta(T = 10.00) = 0.214304$, and $\beta(T = 11.25) = 0.215791$.

The fits were made using the unweighted Levenberg-Marquardt
algorithm of Gnuplot ver. 4.2 [14], initialized with
$\beta \approx 0.1$. For consistency, because some runs terminate very
rapidly, we consider only the first 50 time steps.
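For readers without gnuplot, the fit of Eq. (6) can be illustrated with a small stand-in. The sketch below is a one-parameter damped least-squares iteration in the Levenberg-Marquardt style, written from scratch; it is not gnuplot's implementation, and the function name, damping schedule, and the clamp of the decay constant to non-negative values are all assumptions. The data in the usage lines are synthetic, generated from Eq. (6) itself, and are not results from the paper.

```python
import math


def fit_decay_constant(times, fractions, beta0=0.1, max_iter=200, tol=1e-12):
    """Unweighted least-squares fit of i(t) = 1 - exp(-beta * t) for beta."""

    def sum_squared_error(beta):
        return sum((y - (1.0 - math.exp(-beta * t))) ** 2
                   for t, y in zip(times, fractions))

    beta, damping = beta0, 1e-3
    current = sum_squared_error(beta)
    for _ in range(max_iter):
        gradient = 0.0   # sum of (d model / d beta) * residual
        curvature = 0.0  # sum of (d model / d beta) squared
        for t, y in zip(times, fractions):
            decay = math.exp(-beta * t)
            slope = t * decay                 # derivative of the model in beta
            gradient += slope * (y - (1.0 - decay))
            curvature += slope * slope
        if curvature == 0.0:
            break
        # Damped step; beta is kept non-negative (an assumption of this sketch).
        trial = max(beta + gradient / (curvature * (1.0 + damping)), 0.0)
        trial_error = sum_squared_error(trial)
        if trial_error <= current:
            converged = abs(trial - beta) < tol
            beta, current = trial, trial_error
            damping /= 10.0   # step accepted: trust the local model more
            if converged:
                break
        else:
            damping *= 10.0   # step rejected: take a shorter step next time
    return beta


# Synthetic check: recover a known decay constant from noise-free data.
steps = list(range(50))
curve = [1.0 - math.exp(-0.2 * t) for t in steps]
beta_fit = fit_decay_constant(steps, curve)
```

Starting from 0.1, as in the fits reported here, the iteration should recover the value used to generate the synthetic curve; on simulation output the result would depend on how many time steps are included.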

From Fig. 1, it appears that the spread of infection
becomes faster as $T$ increases. For $T = 1.25$ and $T = 2.50$,
the rates of spread are very slow, with neither reaching 50%
infected at the last iteration. In particular, for $T = 1.25$, no
new infectives were produced. These low-traffic systems are
not dynamic, as nodes have a low probability of coming
online from their initial offline state. The network is also
very efficient, with $\langle E_{i,j}(T = 1.25) \rangle = -4.00$ and
$\langle E_{i,j}(T = 2.50) \rangle = -3.57$, which may be interpreted as information
exchange being limited to necessary transactions. For this
reason, there is little information exchange and hence a slow
spread. For very high $T$, as in $T = 10.00$ and $T = 11.25$,
the spread is rapid and nearly 100% infection is reached.
This suggests that very high traffic means a large volume
of information exchange that leads to a faster spread of
infection. The system is also inefficient at very high $T$, with
$\langle E_{i,j}(T = 11.25) \rangle = -0.76$. It is worth mentioning that the
average infection curves of $T = 10.00$ and $T = 11.25$ nearly
coincide, indicating rates of spread that are very similar.

Figure 2. T-Dependence of Rates: The increase in the rate of infection
corresponds with the decrease in efficiency in the network. Note that
E-values are negative.

The observations are supported by the calculated decay
constants. The calculated $\beta$ initially increases with traffic
but is capped off at very high $T$, where it becomes constant.
This behavior is similar to the saturation region in a traffic
network where flux saturates at high densities. The saturation
region indicates that information exchange is no longer
freely flowing and that some kind of congestion has occurred
[15]. In Fig. 2, there is an evident transition that occurs in
both the average rate of spread and the efficiency of the
network. In the "congested" region, the efficiency of the
network is very low, while in the "free flow" region, the
efficiency of the network is comparatively high. Congestion
occurs because networks can only handle a limited amount
of traffic in the form of data packets. When there is too
much traffic, the network is forced to store or drop packets,
making it inefficient [15][16]. An increase in packet loss
with increasing data traffic is reflected by the decrease in
efficiency in the congestion region. The congestion is most
likely a result of the limited size of the network, and the
"finite-size effect" may be confirmed by testing a larger
network [15].

IV. Conclusion 

Our Ising model approach accounts for the connection 
status of nodes in an infected network. Unlike most epidemic 
models where all nodes are assumed to be always connected, 
the model allows malware propagation only among online 
nodes. We found that the rate of infection becomes faster in 
less efficient networks with higher data traffic and saturates 
as the network becomes congested. 